feat(scheduler): Backend interface + FileBackend + CronJob manifest helpers (#162 part 2) #169
Merged
…elpers (#162 part 2)

First half of part 2 of the #162 stack. Introduces the ScheduleBackend abstraction the runtime + 'forge package' both consume, ships the FileBackend wrapping existing behavior with zero observable change, and lands the pure-Go CronJob manifest builders + in-cluster detection helper that parts 2b/3 build on.

forge-core/scheduler/backend.go
ScheduleBackend interface (Start, Stop, Reload, Sync, List, Get, Set, Delete, History) bundling timing + persistence into one surface. FileBackend wraps Scheduler + ScheduleStore behind the interface. SourceYAML / SourceLLM constants pin the reconciliation rule: Sync upserts yaml-sourced entries, prunes yaml-sourced entries dropped from the manifest, preserves LLM-sourced ones, and carries LastRun / RunCount across re-Sync.

forge-core/scheduler/k8s_detect.go
InCluster() helper. Default signal is the projected ServiceAccount token at /var/run/secrets/kubernetes.io/serviceaccount/token. FORGE_IN_CLUSTER env var overrides for dev/test.

forge-core/scheduler/k8s_manifest.go
CronJobName + CronJobYAML: pure-Go manifest builders. K8s-name sanitization (RFC 1123 subset) with hash-suffix when truncating past 63 chars so distinct (agent, schedule) pairs that share a prefix don't collide. CronJob spec uses concurrencyPolicy: Forbid (K8s-native equivalent of FileBackend's overlap skip) and embeds X-Forge-Schedule-Id header (enables schedule_fire / _complete audit-event linkage in part 3). Task text is shell-escaped for the single-quoted JSON-RPC body.

forge-core/types/config.go
SchedulerConfig + K8sSchedulerConfig blocks on ForgeConfig. Backend: "auto" (default) | "file" | "kubernetes". Kubernetes block fields: Namespace, ServiceURL, AllowDynamic (default false), TriggerImage, AuthSecretName.

forge-cli/runtime/runner.go
Swap r.sched *Scheduler for r.schedBackend Backend. FileBackend constructed at the existing wiring point with the same Scheduler + store the pre-#162 path used. Zero behavior change. The lazyScheduleReloader delegates Reload through the Backend.

Tests:
- backend_test.go (5 tests): Sync idempotency, yaml-entry pruning, LLM-entry preservation, per-run state preservation across re-Sync, Store() accessor.
- k8s_manifest_test.go (5 tests): CronJob name fits 63 chars, hash-suffix uniqueness on truncation, manifest has required K8s-side fields, defaults applied for missing inputs, single-quote shell escaping, FORGE_IN_CLUSTER env override (both directions).

Docs:
- docs/deployment/scheduler-kubernetes.md: new operator-facing reference covering the backend resolution ladder, CronJob shape, token plumbing via forge auth, and the security model.

NOT in this PR (deliberate split):
- KubernetesBackend with client-go for runtime CronJob CRUD. The runtime backend interface and helper code shipped here are enough for static (forge.yaml) schedules deployed via 'forge package' in part 3: the cluster's CronJob controller handles timing without any runtime API calls. The LLM-driven schedule_set path requires client-go and lands as part 2b.

Refs #162
initializ-mk added a commit that referenced this pull request on Jun 15, 2026
… CRUD (#162 part 2b)

Second half of part 2 of the #162 stack. Builds on the ScheduleBackend interface + FileBackend refactor + manifest helpers shipped in part 2 (PR #169). Adds the real K8s runtime backend, dependency on k8s.io/client-go, and the forge.yaml-driven backend selection in the runner.

forge-cli/runtime/scheduler_k8s_backend.go
KubernetesBackend implements scheduler.Backend by delegating persistence + timing to the cluster's CronJob controller:
- Start/Stop/Reload are no-ops (cluster owns timing)
- Sync reconciles cluster CronJobs against declared yaml entries: create, update on drift, prune dropped yaml entries, PRESERVE LLM-sourced entries unconditionally
- Set / Delete are gated by AllowDynamic (default false); yaml-sourced CronJobs cannot be deleted via direct Delete (Sync is the only removal path for declarative entries)
- List filters by forge.agent.id label so unrelated CronJobs in the namespace don't appear in schedule_list output
- History returns empty + warns once; the audit stream's schedule_fire/complete events are the canonical source
- CronJobs are constructed in-memory to match the YAML the forge-core scheduler.CronJobYAML emits byte-for-byte, so runtime reconcile doesn't churn against forge package manifests (#162 part 3)
- Round-trips Schedule <-> CronJob via labels (agent.id, schedule.id, schedule.source) and annotations (task, skill, channel, channel_target, run_count, last_status)
- LastRun read from CronJob.Status.LastScheduleTime
- NewKubernetesBackendWithClient testing seam accepts an explicit kubernetes.Interface (fake.Clientset in tests)

forge-cli/runtime/runner.go
Backend selection wired off forge.yaml scheduler.backend:
- "kubernetes": always K8s; errors at startup when not in-cluster (FORGE_IN_CLUSTER=true overrides for tests)
- "file": always FileBackend
- "auto"/"": K8s when in-cluster, file otherwise
Drops the old syncYAMLSchedules helper now that the runner calls Backend.Sync(declaredSchedules()) for both modes.

go.mod
k8s.io/api, k8s.io/apimachinery, k8s.io/client-go @ v0.36.2.

Tests (9 cases against fake.Clientset):
- Sync creates CronJobs for declared entries with the expected labels + ConcurrencyPolicy=Forbid
- Sync is idempotent (no churn on no-op re-run)
- Sync updates on cron drift
- Sync prunes yaml entries removed from the manifest
- Sync preserves LLM-sourced entries on yaml-only re-Sync
- Dynamic Set is gated by AllowDynamic with an actionable error referencing the config flag
- Dynamic Delete of a yaml-sourced schedule is refused with an error pointing operators at the manifest
- List filters by forge.agent.id (unrelated CronJobs in the namespace are not returned)
- History returns empty + does not error

Docs:
- docs/deployment/scheduler-kubernetes.md gains the RBAC table, the annotation round-trip reference, and the "what's not in the K8s backend" section flagging schedule_history deferral + cross-namespace out-of-scope + token rotation as a follow-up.

Refs #162
Summary
First half of part 2 of the #162 stack. Introduces the `ScheduleBackend` abstraction the runtime and `forge package` both consume, ships the `FileBackend` wrapping existing behavior with zero observable change, and lands the pure-Go CronJob manifest builders + in-cluster detection helper that parts 2b/3 build on.
Scope split inside part 2
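`#162 part 2` as originally scoped was "backend interface + impls + auto-detection." The K8s runtime CRUD implementation requires `k8s.io/client-go`, a ~10 MB transitive dependency. To keep this PR reviewable I split it in two: this PR carries the backend interface, `FileBackend`, the CronJob manifest helpers, and in-cluster detection; part 2b adds `KubernetesBackend` with `client-go` for runtime CronJob CRUD.
Together with part 3, this PR is enough to ship static-schedule K8s deploys end-to-end without waiting for part 2b: `forge.yaml` schedules become CronJobs at build time, the cluster's CronJob controller handles timing, and no runtime API calls are needed.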
Files
Behavior change
None. `FileBackend` is a structural wrapper around the existing `Scheduler` + `MemoryScheduleStore`. The 30s ticker, the SCHEDULES.md persistence, the overlap-skip behavior, and the audit events are all unchanged. `lazyScheduleReloader` still works because it now calls `Backend.Reload`, which delegates to the wrapped scheduler.
The `scheduler` block in forge.yaml is purely additive — the existing `schedules[]` array continues to work as before.
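As a rough illustration of how the new `scheduler.backend` value could be resolved, the sketch below combines the in-cluster signal with the `"auto"` / `"file"` / `"kubernetes"` ladder described in the commit messages. The function names, the injected `getenv` / `tokenPath` parameters, and the use of `strconv.ParseBool` for `FORGE_IN_CLUSTER` are assumptions made for the example, not the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Default in-cluster signal named in the PR: the projected ServiceAccount token.
const saTokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// inCluster reports whether the process appears to run inside a pod.
// getenv and tokenPath are injected only to keep the sketch testable (assumption).
func inCluster(getenv func(string) string, tokenPath string) bool {
	if v := getenv("FORGE_IN_CLUSTER"); v != "" {
		// Assumption: the override is parsed as a boolean and wins in both directions.
		if b, err := strconv.ParseBool(v); err == nil {
			return b
		}
	}
	_, err := os.Stat(tokenPath)
	return err == nil
}

// resolveBackend maps the configured scheduler.backend value to a concrete backend name.
func resolveBackend(configured string, inK8s bool) (string, error) {
	switch configured {
	case "", "auto":
		if inK8s {
			return "kubernetes", nil
		}
		return "file", nil
	case "file":
		return "file", nil
	case "kubernetes":
		if !inK8s {
			return "", errors.New(`scheduler.backend "kubernetes" requires running in-cluster`)
		}
		return "kubernetes", nil
	default:
		return "", fmt.Errorf("unknown scheduler.backend %q", configured)
	}
}

func main() {
	backend, err := resolveBackend("auto", inCluster(os.Getenv, saTokenPath))
	fmt.Println(backend, err)
}
```

An empty `backend` value is treated the same as `"auto"`, which is what keeps a forge.yaml without a `scheduler` block on the existing file-backed path when running outside a cluster.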
Reconciliation rules pinned by Sync
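Sync upserts yaml-sourced entries, prunes yaml-sourced entries dropped from the manifest, preserves LLM-sourced entries, and carries `LastRun` / `RunCount` across re-Sync.
Test: `TestFileBackend_SyncPreservesLLMSourced` pins the LLM-preservation invariant.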
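The reconciliation rule from the commit message (upsert yaml-sourced entries, prune yaml-sourced entries dropped from the manifest, preserve LLM-sourced ones, carry `LastRun` / `RunCount` across re-Sync) can be sketched as a pure function over a map. The `Schedule` struct and `syncSchedules` are hypothetical stand-ins, and letting a declared entry overwrite an LLM-sourced entry with the same ID is an assumption of this sketch, not something the PR states.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

const (
	SourceYAML = "yaml"
	SourceLLM  = "llm"
)

// Schedule is a hypothetical, trimmed-down schedule record.
type Schedule struct {
	ID       string
	Cron     string
	Task     string
	Source   string
	LastRun  time.Time
	RunCount int
}

// syncSchedules returns the store contents after reconciling existing
// entries against the schedules declared in the manifest.
func syncSchedules(existing map[string]Schedule, declared []Schedule) map[string]Schedule {
	next := make(map[string]Schedule, len(existing)+len(declared))

	// LLM-sourced entries survive every Sync.
	for id, s := range existing {
		if s.Source == SourceLLM {
			next[id] = s
		}
	}

	// Declared entries are upserted as yaml-sourced. Yaml-sourced entries
	// missing from declared are pruned simply by never being copied over.
	for _, d := range declared {
		d.Source = SourceYAML
		if prev, ok := existing[d.ID]; ok {
			// Per-run state is carried across re-Sync.
			d.LastRun = prev.LastRun
			d.RunCount = prev.RunCount
		}
		next[d.ID] = d
	}
	return next
}

func main() {
	existing := map[string]Schedule{
		"digest":  {ID: "digest", Cron: "0 9 * * *", Source: SourceYAML, RunCount: 3},
		"cleanup": {ID: "cleanup", Cron: "0 0 * * 0", Source: SourceYAML},
		"remind":  {ID: "remind", Cron: "*/15 * * * *", Source: SourceLLM},
	}
	declared := []Schedule{{ID: "digest", Cron: "0 8 * * *"}}

	next := syncSchedules(existing, declared)
	ids := make([]string, 0, len(next))
	for id := range next {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		s := next[id]
		fmt.Printf("%s source=%s cron=%q runs=%d\n", id, s.Source, s.Cron, s.RunCount)
	}
}
```

Because the function builds the next state from scratch rather than mutating the existing one, running it twice with the same declared list yields the same result, which is the idempotency property the backend tests check.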
CronJob manifest shape (used by parts 2b + 3)
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: forge-<agent_id>-<schedule_id>  # 63-char safe, hash-suffix when truncating
  labels:
    forge.agent.id: <agent_id>
    forge.schedule.id: <schedule_id>
    forge.schedule.source: yaml | llm
spec:
  schedule: ""
  concurrencyPolicy: Forbid  # K8s-native overlap skip
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: trigger
              image: curlimages/curl:8.10.1
              env:
                - name: FORGE_AUTH_TOKEN
                  valueFrom: {secretKeyRef: {name: <agent_id>-internal-token, key: token}}
              command: ["sh", "-c"]
              args:  # Authorization header is double-quoted so sh expands $FORGE_AUTH_TOKEN
                - >-
                  curl -sfX POST
                  -H "Authorization: Bearer $FORGE_AUTH_TOKEN"
                  -H 'X-Forge-Schedule-Id: '
                  --data '...'
```
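The `--data '...'` body in the manifest above sits inside single quotes in an `sh -c` command, so any single quote in the task text has to be escaped before it is embedded. A minimal sketch of the standard POSIX technique follows; `shellSingleQuote` is a hypothetical name, and it covers only the shell layer, not the JSON encoding of the body.

```go
package main

import (
	"fmt"
	"strings"
)

// shellSingleQuote wraps s in single quotes for POSIX sh. Each embedded '
// becomes '\'' : close the quoted string, emit an escaped quote, reopen it.
func shellSingleQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	fmt.Println(shellSingleQuote(`{"task":"don't page on-call"}`))
	// prints: '{"task":"don'\''t page on-call"}'
}
```

Inside single quotes the shell performs no expansion at all, so this one replacement is sufficient; `$`, backticks, and backslashes in the task text pass through literally.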
`X-Forge-Schedule-Id` is the hook part 3 uses for audit-event linkage (the agent recognizes scheduled fires and emits `schedule_fire` / `schedule_complete` around the normal A2A dispatch).
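The `metadata.name` comment in the manifest mentions a hash suffix when the name is truncated. One way that could work is sketched below, so that two long (agent, schedule) pairs sharing a prefix still get distinct names. The function name `cronJobName`, the SHA-256 hash, and the 8-character suffix are assumptions for illustration; the PR does not say which hash or suffix length `CronJobName` uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

const maxNameLen = 63 // limit quoted in the PR

var invalidChars = regexp.MustCompile(`[^a-z0-9-]+`)

// sanitize lowercases s, replaces every run of characters outside
// [a-z0-9-] with a single "-", and trims leading and trailing "-".
func sanitize(s string) string {
	s = invalidChars.ReplaceAllString(strings.ToLower(s), "-")
	return strings.Trim(s, "-")
}

// cronJobName builds forge-<agent>-<schedule>. When the result exceeds
// maxNameLen it is cut short and a hash of the original IDs is appended.
func cronJobName(agentID, scheduleID string) string {
	name := sanitize("forge-" + agentID + "-" + scheduleID)
	if len(name) <= maxNameLen {
		return name
	}
	// Hash the unsanitized pair so inputs that sanitize or truncate to the
	// same prefix still differ in the suffix.
	sum := sha256.Sum256([]byte(agentID + "/" + scheduleID))
	suffix := hex.EncodeToString(sum[:])[:8]
	head := strings.TrimRight(name[:maxNameLen-len(suffix)-1], "-")
	return head + "-" + suffix
}

func main() {
	fmt.Println(cronJobName("billing", "nightly_report"))
	long := "a-very-long-agent-identifier-that-keeps-going-and-going-past-the-limit"
	fmt.Println(cronJobName(long, "daily"))
	fmt.Println(cronJobName(long, "weekly"))
}
```

The sketch uses the 63-character limit quoted in this PR. The Kubernetes documentation recommends keeping CronJob names to 52 characters or fewer, because the controller appends 11 characters when it names the Jobs it creates.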
Test plan
Refs #162